Parallel SQL Based Association Rule Mining on Large Scale PC Cluster: Performance Comparison with Directly Coded C Implementation

نویسندگان

  • Iko Pramudiono
  • Takahiko Shintani
  • Takayuki Tamura
  • Masaru Kitsuregawa
چکیده

Data mining is becoming increasingly important since the size of databases grows even larger and the need to explore hidden rules from the databases becomes widely recognized. Currently database systems are dominated by relational database and the ability to perform data mining using standard SQL queries will de nitely ease implementation of data mining. However the performance of SQL based data mining is known to fall behind specialized implementation. In this paper we present an evaluation of parallel SQL based data mining on large scale PC cluster. The performance achieved by parallelizing SQL query for mining association rule using 4 processing nodes is even with C based program.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Performance Analysis for Parallel Generalized Association Rule Mining on a Large Scale PC Cluster

One of the most important problems in data mining is discovery of association rules in large database. We had proposed parallel algorithms for mining generalized association rules with classi cation hierarchy. In this paper, we implemented the proposed algorithms on a large scale PC cluster which consists of one hundred PCs interconnected by an ATM switch, and analyzed the performance of our al...

متن کامل

Parallel Data Mining on ATM-Connected PC Cluster and Optimization of Its Execution Environments

In this paper, we have constructed a large scale ATM-connected PC cluster consists of 100 PCs, implemented a data mining application, and optimized its execution environment. Default parameters of TCP retransmission mechanism cannot provide good performance for data mining application, since a lot of collisions occur in the case of all-to-all multicasting in the large scale PC cluster. Using a ...

متن کامل

DWMiner: A Tool for Mining Frequent Item Sets Efficiently in Data Warehouses

This work presents DWMiner, an association rules efficient mining tool to process data directly over a relational DBMS data warehouse. DWMiner executes the Apriori algorithm as SQL queries in parallel, using a database PC Cluster middleware developed for SQL query optimization in OLAP applications. DWMiner combines intraand inter-query parallelism in order to reduce the total time needed to fin...

متن کامل

Parallel Optimized Algorithm for Apriori Association Rule Mining on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently .Now GPU(Graphics Processor Unit) has taken a major role in high performance computing for general purpose applications. Compute Unified Device Architecture (CUDA) programm...

متن کامل

SQL Based Association Rule Mining using Commercial RDBMS (IBM DB2 UDB EEE)

Data mining is becoming increasingly important since the size of databases grows even larger and the need to explore hidden rules from the databases becomes widely recognized. Currently database systems are dominated by relational database and the ability to perform data mining using standard SQL queries will definitely ease implementation of data mining. However the performance of SQL based da...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999